Non-Parametric Spam Filtering based on kNN and LSA

Authors

  • Preslav Ivanov Nakov
  • Panayot Markov Dobrikov
Abstract

The paper proposes a non-parametric approach to filtering unsolicited commercial e-mail messages, also known as spam. The text of each email message is represented as a latent semantic analysis (LSA) vector, which is then fed into a k-nearest neighbor (kNN) classifier. The method shows high accuracy on a collection of recent personal email messages. Tests on the standard LINGSPAM collection achieve an accuracy of over 99.65%, which improves on the best published results to date.
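
The pipeline described in the abstract (message text, then an LSA vector, then a kNN classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy messages, the raw term counts (the paper's term weighting is not given here), the rank of 2, cosine similarity, and k = 3 are all assumptions chosen for illustration, as are the helper names.

```python
import numpy as np

# Toy training data (hypothetical, not from LINGSPAM).
TRAIN_DOCS = [
    "cheap pills buy now",
    "buy cheap watches now",
    "win money now cheap offer",
    "meeting agenda for monday",
    "please review the project report",
    "lunch meeting monday project",
]
TRAIN_LABELS = ["spam", "spam", "spam", "ham", "ham", "ham"]


def build_vocab(docs):
    """Map each distinct lower-cased token to a row index."""
    words = sorted({w for d in docs for w in d.lower().split()})
    return {w: i for i, w in enumerate(words)}


def term_doc_matrix(docs, vocab):
    """Raw term counts, terms as rows and documents as columns (assumed weighting)."""
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            if w in vocab:  # words unseen in training are ignored
                A[vocab[w], j] += 1.0
    return A


def fit_lsa(A, rank):
    """Truncated SVD: keep the top `rank` singular directions."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :rank]
    doc_vecs = Vt[:rank].T * s[:rank]  # one row per training document
    return U_k, doc_vecs


def knn_predict(query_vec, doc_vecs, labels, k=3):
    """Majority vote among the k training documents with highest cosine similarity."""
    norms = np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec)
    sims = (doc_vecs @ query_vec) / np.maximum(norms, 1e-12)
    nearest = np.argsort(-sims)[:k]
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)


VOCAB = build_vocab(TRAIN_DOCS)
U_K, DOC_VECS = fit_lsa(term_doc_matrix(TRAIN_DOCS, VOCAB), rank=2)


def classify(message, k=3):
    """Project a new message into the LSA space (q^T U_k) and classify it with kNN."""
    counts = term_doc_matrix([message], VOCAB)[:, 0]
    return knn_predict(counts @ U_K, DOC_VECS, TRAIN_LABELS, k=k)


if __name__ == "__main__":
    print(classify("cheap offer buy now"))
    print(classify("project meeting report"))
```

Projecting a new message with `counts @ U_K` places it in the same coordinates as the training rows of `DOC_VECS`, since for a training document that product equals its singular-value-scaled row of V. The approach is non-parametric in the sense that the classifier stores the training vectors and decides by neighborhood vote rather than fitting class-specific parameters.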

Similar resources

Using cellular automata for improving knn based spam filtering

With the rapid growth of the Internet, electronic mail (e-mail) has become a popular communication tool. However, junk mail, also known as spam, has increasingly become a part of life for users as well as Internet service providers. To address this problem, many solutions have been proposed in the last decade. Currently, content-based anti-spam filtering methods are an important issue; the...

Application of Refined LSA and MD5 Algorithms in Spam Filtering

The paper proposes a spam filtering method that uses integrated and refined Latent Semantic Analysis (LSA) and Message-Digest Algorithm 5 (MD5) algorithms to address a series of universal problems in spam filtering, including remarkably lowered filtering precision and notably unbalanced filtering efficiency as a result of lack of latent semantic analysis of mail contents. In introducing LSA, it...

A Novel Method for Detecting Spam Email using KNN Classification with Spearman Correlation as Distance Measure

E-mail is one of the most prevalent methods of correspondence because of its availability, quick message exchange, and low sending cost. Spam mail is a serious issue affecting this application on today's Internet. Spam may contain suspicious URLs, or may ask for financial information such as money exchange information or credit card details. This motivates filtering spam from legitimate em...

York University at TREC 2005: SPAM Track

We propose a variant of the k-nearest neighbor classification method, called instance-weighted k-nearest neighbor method, for adaptive spam filtering. The method assigns two weights, distance weight and correctness weight, to a training instance, and makes use of the two weights when classifying a new email. The correctness weight is also used in the maintenance of the training data to make the...
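
The idea of letting each training instance carry two weights can be sketched as follows. The excerpt above is truncated and does not say how the weights are computed or combined, so everything specific here is an assumption: the distance weight is taken as inverse Euclidean distance, the correctness weight is a number supplied per instance, and a neighbor's vote is their product. The function name `weighted_knn` is hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn(query, train, k=3, eps=1e-9):
    """Classify `query` by a weighted vote of its k nearest training instances.

    `train` is a list of (vector, label, correctness_weight) tuples.
    Assumed vote for each neighbor: correctness_weight / (distance + eps).
    """
    # Sort training instances by Euclidean distance to the query.
    scored = sorted(
        ((math.dist(query, vec), label, cw) for vec, label, cw in train),
        key=lambda item: item[0],
    )
    votes = defaultdict(float)
    for dist, label, cw in scored[:k]:
        votes[label] += cw / (dist + eps)
    return max(votes, key=votes.get)


if __name__ == "__main__":
    train = [
        ((0.0, 0.0), "ham", 1.0),
        ((0.1, 0.0), "ham", 1.0),
        ((1.0, 1.0), "spam", 1.0),
        ((0.5, 0.5), "spam", 0.1),  # close to the query, but a low correctness weight
    ]
    print(weighted_knn((0.4, 0.4), train, k=3))
```

In this toy setup the nearest instance is labeled spam, but its low correctness weight lets the two slightly more distant ham instances win the vote; raising that weight to 1.0 flips the decision.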

Content-Based Spam Filtering on Video Sharing Social Networks

In this work we are concerned with the detection of spam in video sharing social networks. Specifically, we investigate how much visual content-based analysis can aid in detecting spam in videos. This is a very challenging task, because of the high-level semantic concepts involved; of the assorted nature of social networks, preventing the use of constrained a priori information; and, what is pa...

Publication date: 2005